[RLlib] RLModule API: SelfSupervisedLossAPI for RLModules that bring their own loss (algo independent). #47581
Conversation
…odule_api_self_supervised_loss
LGTM.
rllib/algorithms/ppo/ppo_learner.py (Outdated)

```python
@abc.abstractmethod
def _update_module_kl_coeff(
    self,
    *,
    module_id: ModuleID,
    config: PPOConfig,
    kl_loss: float,
) -> None:
    """Dynamically update the KL loss coefficients of each module with.
```
"module with"?
fixed
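For context on the hook itself: a concrete Torch override might follow the classic adaptive-KL schedule, as in the sketch below. This is illustrative only; the `curr_kl_coeffs_per_module` store and the 2.0/0.5 thresholds with 1.5/0.5 multipliers are assumptions, not necessarily what the PR implements.

```python
from ray.rllib.algorithms.ppo.ppo import PPOConfig
from ray.rllib.algorithms.ppo.ppo_learner import PPOLearner
from ray.rllib.utils.typing import ModuleID


class MyPPOTorchLearner(PPOLearner):
    def _update_module_kl_coeff(
        self,
        *,
        module_id: ModuleID,
        config: PPOConfig,
        kl_loss: float,
    ) -> None:
        # Assumed: a per-module dict of torch tensors holding the
        # current KL coefficients, maintained by the Learner.
        kl_coeff = self.curr_kl_coeffs_per_module[module_id]
        # Grow the penalty when the observed KL overshoots the target,
        # shrink it when it undershoots (classic adaptive-KL rule).
        if kl_loss > 2.0 * config.kl_target:
            kl_coeff.data *= 1.5
        elif kl_loss < 0.5 * config.kl_target:
            kl_coeff.data *= 0.5
```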
```python
learner_config_dict={
    # Intrinsic reward coefficient.
    "intrinsic_reward_coeff": 0.05,
    # Forward loss weight (vs inverse dynamics loss). Total ICM loss is:
```
Very nice comment!
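To sketch how these two config entries could be consumed (hedged: the diff is truncated above, so the `forward_loss_weight` key and all tensor names here are illustrative placeholders):

```python
import torch

# Placeholder config and tensors (illustrative names and values only).
learner_config_dict = {"intrinsic_reward_coeff": 0.05, "forward_loss_weight": 0.2}
forward_loss = torch.tensor(1.3)    # forward-dynamics head loss
inverse_loss = torch.tensor(0.7)    # inverse-dynamics head loss
extrinsic_rewards = torch.zeros(4)
intrinsic_rewards = torch.rand(4)   # e.g. forward-model prediction error

w = learner_config_dict["forward_loss_weight"]
# Total ICM loss: convex combination of the two ICM heads.
icm_loss = w * forward_loss + (1.0 - w) * inverse_loss

# Intrinsic rewards are scaled and added on top of the env rewards.
rewards = extrinsic_rewards + learner_config_dict["intrinsic_reward_coeff"] * intrinsic_rewards
```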
```python
class DQNTorchLearnerWithCuriosity(DQNRainbowTorchLearner):
    def build(self) -> None:
```
Dumb question: Can't we just override `AlgorithmConfig.build_learner_pipeline()`?
We could, but again, this should be done inside the Learner, imo.

But you have a good point: How can we make this even easier for the user? Maybe offer a better way to customize the Learner pipeline? Currently, users can only prepend connector pieces to the beginning, then RLlib adds the default pieces to the end. But here, we need a (custom) connector piece to move all the way to the end, which is not possible with the `config.learner_connector` property.
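A sketch of the asymmetry described above (hedged: `ICMConnector` is a hypothetical connector piece, and the pipeline attribute/method names are assumptions about RLlib's ConnectorV2 stack):

```python
# Today: pieces passed via the config are PREPENDED; RLlib then adds
# its default pieces after them.
config.training(
    learner_connector=lambda obs_space, act_space: ICMConnector(),
)

# What the curiosity example needs instead: a piece at the very END of
# the pipeline, which currently requires overriding the Learner's build().
class DQNTorchLearnerWithCuriosity(DQNRainbowTorchLearner):
    def build(self) -> None:
        super().build()
        # Assumed pipeline attribute/method; appends after all defaults.
        self._learner_connector.append(ICMConnector())
```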
```python
    ],
    dim=0,
)
obs = tree.map_structure(
```
Very nice!
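The pattern being praised: `tree.map_structure` batches arbitrarily nested observations leaf-by-leaf while preserving the nesting structure. A minimal standalone illustration with made-up data (`tree` is the `dm_tree` package that RLlib imports under that name):

```python
import numpy as np
import tree  # dm_tree

# Two nested observations, e.g. from consecutive timesteps.
obs_0 = {"camera": np.zeros((84, 84)), "state": np.array([0.0, 1.0])}
obs_1 = {"camera": np.ones((84, 84)), "state": np.array([2.0, 3.0])}

# Stack leaf-by-leaf along a new batch axis; only the leaves change
# shape, the dict structure stays intact.
obs = tree.map_structure(lambda *leaves: np.stack(leaves, axis=0), obs_0, obs_1)
assert obs["camera"].shape == (2, 84, 84)
assert obs["state"].shape == (2, 2)
```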
```diff
     *,
     learner: "TorchLearner",
     module_id: ModuleID,
     config: "AlgorithmConfig",
     batch: Dict[str, Any],
     fwd_out: Dict[str, Any],
 ) -> Dict[str, Any]:
-    module = learner.module[module_id]
+    module = learner.module[module_id].unwrapped()
```
I guess we need this for DDP?
correct
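For readers following along: under multi-GPU training the Learner may hand back each RLModule wrapped in a DDP-style wrapper (e.g. around torch's `DistributedDataParallel`), so attributes defined only on the user's module wouldn't resolve on the wrapper. A hedged illustration (`_inverse_net` is hypothetical):

```python
module = learner.module[module_id]
# `module` may be a DDP-style wrapper here; a custom attribute such as
# `module._inverse_net` (hypothetical) could raise AttributeError.

module = learner.module[module_id].unwrapped()
# `unwrapped()` returns the plain RLModule (and is a no-op when no
# wrapper is present), so custom attributes and methods resolve as
# expected in both single- and multi-GPU setups.
```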
```diff
-    @staticmethod
-    def compute_loss_for_module(
+    @override(SelfSupervisedLossAPI)
+    def compute_self_supervised_loss(
```
What somehow irritates me is that we are putting the loss function into the module, but still build a special learner to handle this. Instead, we could directly override the Learner's `compute_loss_for_module`, couldn't we?
See my answer to your comment above.
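Tying the thread together, a module opting into the new API might look roughly like the sketch below. Only the hook's signature follows the diffs above; the class name, loss body, and dict keys are illustrative assumptions:

```python
from typing import Any, Dict

from ray.rllib.core.rl_module.apis import SelfSupervisedLossAPI
from ray.rllib.core.rl_module.torch.torch_rl_module import TorchRLModule
from ray.rllib.utils.annotations import override
from ray.rllib.utils.typing import ModuleID


class ICMTorchRLModule(TorchRLModule, SelfSupervisedLossAPI):
    @override(SelfSupervisedLossAPI)
    def compute_self_supervised_loss(
        self,
        *,
        learner: "TorchLearner",
        module_id: ModuleID,
        config: "AlgorithmConfig",
        batch: Dict[str, Any],
        fwd_out: Dict[str, Any],
    ):
        # The module owns its loss entirely; the Learner calls this hook
        # instead of its own `compute_loss_for_module()`.
        w = config.learner_config_dict["forward_loss_weight"]  # assumed key
        # Illustrative `fwd_out` keys for the two ICM heads:
        return (
            w * fwd_out["forward_loss"]
            + (1.0 - w) * fwd_out["inverse_loss"]
        )
```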
…odule_api_self_supervised_loss
Signed-off-by: sven1977 <svenmika1977@gmail.com>
# Conflicts:
#	rllib/core/rl_module/apis/__init__.py
…odule_api_self_supervised_loss
…g their own loss (algo independent). (ray-project#47581) Signed-off-by: ujjawal-khare <ujjawal.khare@dream11.com>
RLModule API: `SelfSupervisedLossAPI` for RLModules that bring their own loss (algo independent). The Learner now checks whether any RLModule (in the MultiRLModule) implements this API and, if yes, calls the Module's own `compute_self_supervised_loss` method (instead of the Learner's `compute_loss_for_module()` method).
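The check described above could be pictured like this (a hedged sketch of the control flow, not the actual Learner source):

```python
# Per-module loss dispatch inside the Learner (sketch):
module = self.module[module_id].unwrapped()

if isinstance(module, SelfSupervisedLossAPI):
    # The RLModule brings its own, algo-independent loss.
    loss = module.compute_self_supervised_loss(
        learner=self,
        module_id=module_id,
        config=self.config.get_config_for_module(module_id),
        batch=batch,
        fwd_out=fwd_out,
    )
else:
    # Default path: the algorithm's Learner defines the loss.
    loss = self.compute_loss_for_module(
        module_id=module_id,
        config=self.config.get_config_for_module(module_id),
        batch=batch,
        fwd_out=fwd_out,
    )
```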
Why are these changes needed?

Related issue number
Checks

- I've signed off every commit (by using the `-s` flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.